Tag

#AI optimization

29 articles

A Coding Guide to NVIDIA’s Tile-Based GPU Programming: From cuTile and Triton Kernels to Flash Attention

This article explains tile-based GPU programming concepts, focusing on NVIDIA's cuTile and on Triton, and how they enable efficient Flash Attention in large language models.

Jul 11

tech

Meet Nemotron Labs 3 Puzzle 75B A9B: A Compressed Hybrid MoE LLM Delivering 2.03x Server Throughput

NVIDIA introduces Nemotron-Labs-3-Puzzle-75B-A9B, a compressed hybrid MoE LLM delivering 2.03x server throughput, leveraging hardware-aware compression and knowledge distillation.

Jul 9

tech

I've been reviewing laptops for years: These are the 15+ best July 4th laptop deals

This article explains how artificial intelligence and machine learning optimize retail pricing and promotional strategies, using laptop sales as an example of sophisticated data-driven decision making.

Jul 2

Meta's non-invasive brain-to-text AI is closing the gap with surgical implants

Meta's new non-invasive brain-to-text AI system, Brain2Qwerty v2, translates brain activity into typed sentences without requiring surgery. The technology is advancing rapidly, with AI optimization playing a key role.

Jul 1

OpenAI reportedly cut response costs for guest ChatGPT users by more than half

OpenAI has reportedly cut inference costs for its AI models by more than half, significantly reducing the number of GPUs needed to process ChatGPT responses.

Jun 30

DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throughput on NVIDIA Blackwell

Researchers at UC San Diego introduce DFlash, a new speculative decoding technique that drafts whole token blocks in parallel, achieving up to 15x throughput improvement on NVIDIA Blackwell.

Jun 23

Cisco AI Introduces FAPO: Pipeline-Aware Prompt Optimization With Step-Level Failure Attribution and Claude Code Orchestration

Learn how FAPO, a new AI tool from Cisco, automatically improves AI prompts by analyzing each step of a task to make AI systems more accurate and reliable.

Jun 20

The KV Cache Compression Race: TurboQuant vs OSCAR vs EpiCache

As KV cache memory grows to exceed the memory taken by model weights in large language models, three compression techniques (TurboQuant, OSCAR, and EpiCache) are emerging as key contenders. Each takes a distinct approach to optimization, and they are seen as complementary rather than competing.

Jun 18

Microsoft's SkillOpt boosts GPT-5.5 by using nothing but a trained Markdown file

Learn how to create and apply SkillOpt Markdown files to dramatically improve AI agent performance on procedural tasks, boosting models like GPT-5.5 by 23 points.

Jun 13

Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

Researchers introduce GEPA, a reflective prompt-evolution framework that enhances small language models' performance on multi-step arithmetic problems through structured feedback and multi-component prompt design.

Jun 7

tech

I never use a new iPhone until I change these settings - why they're such a big deal

This explainer explores how AI-driven smartphone optimization works and why specific settings significantly impact system performance and user experience. It covers machine learning architectures, data collection, and privacy implications.

Jun 7

Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory

This article explains how Google DeepMind's Gemma 4 QAT checkpoints, particularly the Q4_0 and mobile formats, optimize large language models for edge deployment by reducing memory usage and computational requirements through advanced quantization techniques.

Jun 5